1.  INTRODUCTION


The  effectiveness  of  millimeter  wave  (mm-wave)  guided 
weapon  systems  may  be  limited  by  the  ability  to  adequately 
perform the target acquisition function. Present mm-wave
technology  provides  for  the  detection  of  moving  targets  in 
clutter  and  of  stationary  targets  in  low  clutter.  Only 
limited  classification  is  currently  attainable;  basically, 
classification  with  mm-wave  sensors  is  limited  to  moving 
versus  stationary  targets  and  ground  versus  air  targets. 
There  is  a  need  for  a  mm-wave  target  acquisition  system  that 
provides  for  the  detection  of  both  moving  and  stationary 
targets  in  all  clutter  environments  and  that  also  provides 
for  classification  to  the  recognition  level,  with  further 
classification  to  the  identification  level  (IFF)  being 
highly  desirable. 

It  is  believed  that  the  desired  target  acquisition 
capability  may  be  achieved  via  the  implementation  of  a 
multiple  discriminant  system.  Multiple  discriminant 
processing  is  based  on  the  premise  that  effective 
classification  can  be  achieved  by  the  logical  and/or 
statistical  processing  of  reasonably  independent  sets  of 
radar discriminants. Such a system is depicted in Figure 1.
A  millimeter  radar  is  used  to  obtain  returns  dependent  on 
target/clutter  characteristics.  These  are  then  processed, 
usually  with  respect  to  time,  to  produce  derived 
discriminants  such  as  doppler  spectra,  cross  polarization, 
etc.  These  discriminants  are  then  processed  to  obtain 
numbers  representing  the  physical  characteristics  of  the 
target,  such  as  size,  velocity,  etc.  The  job  of  the 
classifier  is  (1)  to  compare  the  features  with  those  of 
training  targets  stored  in  the  classifier  memory,  and  (2)  to 
reach  a  decision,  ideally  based  on  exact  target 
identification. 

Dasarathy [1] has made a simulation study of three basic
classification  techniques:  (1)  the  maximum  likelihood 
classifier;  (2)  the  nearest  neighbor  classifier,  and  (3)  the 
linear  classifier.  Each  technique  was  shown  to  be 
potentially  useful. 


1.  B.V. Dasarathy, TARECS: A Target Recognition System for
Identification of Land Targets in Combat Environments
Using Millimeter Sensors - A Simulation Study, M&S
Computing, Inc., Report No. T-CR-78-20, September 1978.

Figure 1.  Radar classification system


The  purpose  of  this  report  is  to  delineate  explicitly 
the  implementation  of  these  three  classification  techniques 
for  radar  systems  for  target  classification  and,  further,  to 
assess  the  applicability  of  microprocessors  to  these 
classifiers.  Section  2  of  this  report  presents  a  brief 
review  of  nomenclature  relating  to  classifiers,  after  which 
a  report  section  is  devoted  to  each  of  the  three 
classifiers. 

The  maximum  likelihood  classifier  is  presented  in 
Section  3;  it  is  shown  that  the  assumption  of  Gaussian 
distributions  of  features  results  in  low  memory  storage 
requirements  and  small  computational  loads.  In  Section  4, 
the  basic  nearest  neighbor  technique  is  shown  to  be 
simplistic  in  implementation  but  it  results  in  high  memory 
and  computational  requirements.  The  linear  classifier, 
discussed  in  Section  5,  is  found  to  lie  somewhere  between 
the  previous  classifiers  in  terms  of  assumptions  and 
computation. 

In  Section  6  an  effort  is  made,  through  an  example,  to 
assess  the  implementation  of  each  classifier  using  available 
microprocessors.  Section  7  presents  an  overall  summary  of 
the  report  and  gives  conclusions  drawn  from  the  analysis. 

2.  CLASSIFICATION  TECHNIQUES 

The  three  statistical  classification  techniques  studied 
in  this  report  use  pattern  recognition  or  statistical 
hypothesis  testing  in  algorithms  which  lend  themselves  to 
implementation in field system microprocessors. The
computing  power  and  memory  storage  capabilities  of  current 
microprocessor  chips  are  well  suited  for  complex  analyses  of 
target  signatures. 

Statistical  techniques  can  be  divided  into  two  classes: 
parametric and nonparametric. Parametric techniques require
a  knowledge  of  the  probability  distribution  of  the  target 
signature  and  that  this  distribution  be  mathematically 
defined.  For  example,  consider  the  radar  cross  section  (RCS) 
of  a  given  target.  The  RCS  measured  is  a  function  of  aspect 
angle,  and  the  observed  aspect  angle  (under  battlefield 
conditions)  will  have  some  probability  distribution.  If  one 
could  measure  the  RCS  of  various  targets  in  a  combat 
scenario  and  obtain  enough  data,  the  RCS  probability  density 
of  each  target  class  could  be  plotted  and  a  mathematical 
representation  be  found.  Ideally,  the  probability  density 
would be mathematically simple, such as the Gaussian
distribution. Experimental data indicate that, for the
example of RCS, this is not likely to be the case.

Parametric techniques lead to the highest reliability
of classification, but they also require the most knowledge
of the targets, and that knowledge must yield well-defined
probability distributions. Nonparametric techniques, which
do not specifically require knowledge of the probability
distributions, are useful when target signature statistics
are highly complex or cannot be reliably defined. This is
the situation most likely in the real world. Given an
initial set of training data for which the targets are
labeled, a nonparametric algorithm determines the most
likely target type for a given input. It should be mentioned
that analytical error analysis (decision reliability) is
possible for parametric techniques, whereas, for
nonparametric techniques, only empirical error analysis is
feasible.

Use  of  statistical  pattern  recognition  techniques 
usually  implies  that  the  fundamental  data  is  directly 
related  to  identifiable  physical  characteristics.  There  is 
another  class  of  techniques,  syntactic  pattern  recognition 
techniques,  in  which  a  target  signature  is  broken  down  into 
a  combination  of  substructures  or  primitives  for  which  the 
relationships  to  physical  characteristics  are  not  readily 
apparent.  For  an  unknown  target,  the  classifier  searches  for 
these  substructures  and  uses  algorithms  (grammar  rules)  to 
test  for  various  target  types. 

3.  THE  MAXIMUM  LIKELIHOOD  CLASSIFIER 

The  maximum  likelihood  classifier,  or  Bayesian 
classifier,  requires  a  probabilistic  description  of  the 
target signature, i.e., it is a parametric technique. Any
target  signature  which  is  aspect  angle  dependent  can  in 
principle  be  applied  to  this  technique,  so  that  the 
probability  distribution  contains  both  the  angle  dependence 
of  the  target  signature  and  the  probability  of  aspect  angle. 
In  practice,  complex  targets  will  not  have  signatures  which 
vary  smoothly  with  aspect  angle. 

As  an  example  of  this  classification  technique,  consider 
the  radial  velocity  distribution  of  moving  targets  obtained 
from  doppler  shift  frequencies.  Assume  that  a  sufficient 
quantity  of  training  data  is  available  and  that  the  data 
fortuitously  can  be  fitted  by  a  Gaussian  distribution.  Use 
of  the  Gaussian  (or  Normal)  distribution  simplifies  the 
analysis,  because  the  data  are  completely  described  by  two 
parameters, the mean, $\mu$, and the standard deviation, $\sigma$. We
consider two target types, tanks and jeeps, with tanks
moving more slowly than jeeps. Assume the following data:

Mean velocity of tanks = $\mu_T$ = 16 mph

Standard deviation of tank velocity = $\sigma_T$ = 5 mph

Mean velocity of jeeps = $\mu_J$ = 35 mph

Standard deviation of jeep velocity = $\sigma_J$ = 10 mph

The  Gaussian  probability  density  function  is 


$$ p(v) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(v-\mu)^2/2\sigma^2} \tag{1} $$


and  this  function  is  plotted  for  the  assumed  data  in  Figure 
2.  The  functions  plotted  in  Figure  2  are  known  as  a 
posteriori  conditional  probability  densities,  because  they 
indicate the probability of a target velocity being v,
given  that  the  target  is  known.  We  denote  these  by 


$$ p(v/T) = \frac{1}{\sqrt{2\pi}\,\sigma_T}\, e^{-(v-\mu_T)^2/2\sigma_T^2} \tag{2} $$

$$ p(v/J) = \frac{1}{\sqrt{2\pi}\,\sigma_J}\, e^{-(v-\mu_J)^2/2\sigma_J^2} \tag{3} $$


Figure 2.  Velocity probability densities (horizontal axis: velocity, mph)


If  we  take  a  measurement  on  an  unknown  target  and  obtain 
velocity  v  (here  v  is  a  number  -  not  a  variable),  what  we 
would like to know is: (1) the conditional probability,
P(T/v), that the target is a tank, given that the velocity
is v, and (2) the conditional probability that the target is
a jeep, P(J/v), given that the velocity is v. Based on these
probabilities, we can then make a decision whether to fire a
weapon. For example, if P(T/v) > P(J/v), then we will shoot
at the target.

Additional  information  that  we  get  from  the  training 
data  are  the  unconditional  a  priori  probabilities,  P(T)  and 
P(J).  P(T)  is  the  probability  that  any  given  target  in  the 
field  is  a  tank;  P(J)  is  the  same  for  a  jeep.  (We  maintain 
the  simplification  that  there  are  only  two  types  of 
targets.)  For  this  example,  we  assume  that  the  average  ratio 
of  tanks  to  jeeps  in  all  the  battles  we  have  monitored  is 
three  to  one.  Thus  P(T)  =  .75,  and  P(J)  =  .25.  [Note  that 
P(T) + P(J) = 1.]

Finally,  we  define  the  unconditional  probability  that 
any  velocity  measurement  we  make  yields  the  value  v;  this  is 

$$ p(v) = p(v/T)\,P(T) + p(v/J)\,P(J). \tag{4} $$

Bayes' theorem is then used to compute P(T/v) and P(J/v).
For tanks, Bayes' theorem gives

$$ P(T/v) = \frac{p(v/T)\,P(T)}{p(v)} \tag{5} $$


and  for  jeeps  is 


$$ P(J/v) = \frac{p(v/J)\,P(J)}{p(v)}. \tag{6} $$

The decision rule is: We fire if

$$ P(T/v) > P(J/v). \tag{7} $$

Using Equations (5) and (6), this becomes

$$ p(v/T)\,P(T) > p(v/J)\,P(J), \tag{8} $$

where the p(v) terms in the denominators have cancelled.
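
The rule can be checked numerically. The following is a minimal sketch, not part of the original analysis: it evaluates Equations (1) and (4) through (8) for the assumed tank and jeep data. The function names are illustrative only.

```python
import math

def gaussian_pdf(v, mean, std):
    """Gaussian density of Equation (1)."""
    return math.exp(-((v - mean) ** 2) / (2.0 * std ** 2)) / (math.sqrt(2.0 * math.pi) * std)

# Values from the text: velocities in mph, priors from the 3:1 tank-to-jeep ratio.
MU_T, SIGMA_T, P_T = 16.0, 5.0, 0.75
MU_J, SIGMA_J, P_J = 35.0, 10.0, 0.25

def posteriors(v):
    """Return (P(T/v), P(J/v)) using Equations (4), (5), and (6)."""
    num_t = gaussian_pdf(v, MU_T, SIGMA_T) * P_T
    num_j = gaussian_pdf(v, MU_J, SIGMA_J) * P_J
    p_v = num_t + num_j                      # Equation (4)
    return num_t / p_v, num_j / p_v

def fire(v):
    """Decision rule of inequality (8): p(v/T) P(T) > p(v/J) P(J)."""
    return gaussian_pdf(v, MU_T, SIGMA_T) * P_T > gaussian_pdf(v, MU_J, SIGMA_J) * P_J

for v in (10.0, 20.0, 30.0, 40.0):
    p_t, p_j = posteriors(v)
    print(f"v = {v:4.1f} mph  P(T/v) = {p_t:.3f}  P(J/v) = {p_j:.3f}  fire = {fire(v)}")
```

With these numbers the decision changes from "fire" to "hold" between 26 and 27 mph.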

We can extend this treatment to N target types denoted
by $A_i$, i = 1, 2, ..., N.

Equation  (4)  becomes 


$$ p(v) = \sum_{i=1}^{N} p(v/A_i)\,P(A_i). \tag{9} $$


We compute all the $p(v/A_i)\,P(A_i)$, and the target is
classified as $A_j$ if

$$ p(v/A_j)\,P(A_j) > p(v/A_i)\,P(A_i) \text{ for all } i \neq j. \tag{10} $$


For  the  Gaussian  distribution,  the  computation  is 
simplified  by  taking  the  natural  log  of  both  sides  of 
inequality  (8),  giving  the  decision  to  fire  if 


$$ \frac{(v-\mu_J)^2}{\sigma_J^2} - \frac{(v-\mu_T)^2}{\sigma_T^2} > C, \tag{11} $$

where

$$ C = 2 \ln\left[\frac{\sigma_T\,P(J)}{\sigma_J\,P(T)}\right]. \tag{12} $$
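
As a check on the logarithmic form, the sketch below compares it with inequality (8) for the assumed data. It takes the threshold that follows from the logarithm of (8) with Gaussian densities, C = 2 ln[sigma_T P(J) / (sigma_J P(T))]. The function names are illustrative.

```python
import math

MU_T, SIGMA_T, P_T = 16.0, 5.0, 0.75
MU_J, SIGMA_J, P_J = 35.0, 10.0, 0.25

# Threshold obtained by taking the natural log of inequality (8).
C = 2.0 * math.log((SIGMA_T * P_J) / (SIGMA_J * P_T))

def fire_log(v):
    """Logarithmic rule: difference of normalized squared distances against C."""
    lhs = ((v - MU_J) / SIGMA_J) ** 2 - ((v - MU_T) / SIGMA_T) ** 2
    return lhs > C

def fire_direct(v):
    """Inequality (8) evaluated directly; the common sqrt(2 pi) factor is omitted."""
    def weighted_density(mean, std, prior):
        return prior * math.exp(-((v - mean) ** 2) / (2.0 * std ** 2)) / std
    return weighted_density(MU_T, SIGMA_T, P_T) > weighted_density(MU_J, SIGMA_J, P_J)

print(f"C = {C:.4f}")
print(all(fire_log(v) == fire_direct(v) for v in range(0, 61)))
```

Only two squarings, two divisions, and one comparison are needed per decision, since C can be computed once in advance.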


We can also introduce a predetermined "cost of decision"
into the maximum likelihood classifier by defining $r_{ij}$ as
the cost ($0 \le r_{ij} \le 1$) of classifying the target as $A_i$
when it is, in reality, $A_j$. For example, $r_{TJ}$ would be
the cost incurred by classifying a jeep as a tank. This cost
would be higher than that of correctly identifying the jeep
($r_{JJ}$), since ammunition is wasted on a non-threat. Since
identifying a tank as a jeep could be fatal, $r_{JT}$ would be
very high.

For N targets ($A_i$, i = 1, 2, ..., N), it can be shown [2]
that the cost of a decision is minimized if the target is
identified as $A_j$ by determining the minimum value of $R_i$,
i.e.,

$$ R_j < R_i \text{ for all } i \neq j, \tag{13} $$

where

$$ R_i = \sum_{k=1}^{N} P(A_k)\, r_{ik}\, p(v/A_k). \tag{14} $$


For our example of tanks and jeeps,

$$ R_T = P(T)\,r_{TT}\,p(v/T) + P(J)\,r_{TJ}\,p(v/J) \tag{15} $$

and

$$ R_J = P(T)\,r_{JT}\,p(v/T) + P(J)\,r_{JJ}\,p(v/J). \tag{16} $$


2.  H.L. Van Trees, Detection, Estimation, and Modulation
Theory, Part I, John Wiley & Sons, Inc., New York, 1968.


From Equation (13), the decision is made to fire if

$$ P(T)\,r_{TT}\,p(v/T) + P(J)\,r_{TJ}\,p(v/J) < P(T)\,r_{JT}\,p(v/T) + P(J)\,r_{JJ}\,p(v/J) \tag{17} $$

or if

$$ P(T)\,p(v/T)\,(r_{JT} - r_{TT}) > P(J)\,p(v/J)\,(r_{TJ} - r_{JJ}). \tag{18} $$

If tanks and jeeps had the same threat potential, we would
assign $r_{JT} = r_{TJ} = 1$ and $r_{TT} = r_{JJ} = 0$. Equation (18)
then reduces to the previous decision rule given by Equation
(8).
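
The effect of the cost factors can be illustrated with a short sketch of the minimum-cost rule. The priors and velocity statistics are those assumed in the text; the cost values in `COST` are hypothetical and chosen only for illustration.

```python
import math

def gaussian_pdf(v, mean, std):
    return math.exp(-((v - mean) ** 2) / (2.0 * std ** 2)) / (math.sqrt(2.0 * math.pi) * std)

PRIOR = {"T": 0.75, "J": 0.25}
PARAMS = {"T": (16.0, 5.0), "J": (35.0, 10.0)}   # (mean, standard deviation) in mph

# cost[decided][actual]. ZERO_ONE is the equal-threat case described in the text;
# COST uses hypothetical values: missing a tank is costly, wasting a round less so.
ZERO_ONE = {"T": {"T": 0.0, "J": 1.0}, "J": {"T": 1.0, "J": 0.0}}
COST = {"T": {"T": 0.0, "J": 0.3}, "J": {"T": 1.0, "J": 0.0}}

def risk(decided, v, cost):
    """Cost of deciding `decided`: sum over actual classes of P(A) r p(v/A)."""
    return sum(PRIOR[a] * cost[decided][a] * gaussian_pdf(v, *PARAMS[a]) for a in PRIOR)

def classify(v, cost):
    """Choose the decision with the smallest cost."""
    return min(PRIOR, key=lambda decided: risk(decided, v, cost))

for v in (26.0, 27.0, 28.0, 30.0):
    print(v, classify(v, ZERO_ONE), classify(v, COST))
```

With the hypothetical costs, a 27 mph target that the equal-cost rule calls a jeep is classified as a tank, because a missed tank is penalized more heavily than a wasted round.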

Now  consider  the  possibility  of  using  two  target 
signatures  in  a  maximum  likelihood  classifier,  say  radial 
velocities,  as  before,  and  radar  cross  section  (RCS).  Again, 
we  measure  arbitrary  RCS's  of  tanks  (N  measurements)  and 
jeeps  (M  measurements)  under  battlefield  conditions  and 
obtain mean values

$$ \mu_T' = \frac{1}{N}\sum_{i=1}^{N} (\mathrm{RCS})_{Ti}, \qquad \mu_J' = \frac{1}{M}\sum_{j=1}^{M} (\mathrm{RCS})_{Jj} \tag{19} $$

and standard deviations $\sigma_T'$ and $\sigma_J'$.

We assume that the joint probability density is given by the
bivariate Gaussian distribution, where the correlation
between velocity and RCS is zero [3]; that is, the velocity
and  RCS  are  independent  in  the  probability  sense.  The 
conditional  joint  probability  can  then  be  written  as  the 
product  of  two  univariate  Gaussian  distributions,  namely, 


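$$ p(v,\mathrm{RCS}/T) = \frac{1}{2\pi\,\sigma_T\,\sigma_T'}\; e^{-(v-\mu_T)^2/2\sigma_T^2}\; e^{-(\mathrm{RCS}-\mu_T')^2/2{\sigma_T'}^2} \tag{21} $$

and

$$ p(v,\mathrm{RCS}/J) = \frac{1}{2\pi\,\sigma_J\,\sigma_J'}\; e^{-(v-\mu_J)^2/2\sigma_J^2}\; e^{-(\mathrm{RCS}-\mu_J')^2/2{\sigma_J'}^2} \tag{22} $$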


This  result  can  be  extended  to  any  number  of  target 
signatures  as  long  as  they  are  statistically  independent.  If 
the  signatures  are  not  independent,  the  multivariate 
Gaussian  distribution  must  be  used,  and  the  covariances  and 
correlations  must  be  determined.  For  the  case  of  the  two 
signatures,  velocity  and  RCS,  assumed  independent,  the 
decision rule is: Fire if
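
Inequality (8) suggests the form of this rule. As a sketch only, assuming equal decision costs and the independent densities of Equations (21) and (22), the two-signature rule would read as follows; this is an inference from the single-signature case rather than a formula quoted from the report.

$$ p(v,\mathrm{RCS}/T)\,P(T) > p(v,\mathrm{RCS}/J)\,P(J) $$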


3.  Alexander M. Mood and Franklin A. Graybill, Introduction
to the Theory of Statistics, McGraw-Hill Book Company,
Inc., New York, 1963.




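where

$\mu_{ik}$ is the mean of the ith signature $a_i$ for target
type $A_k$,

$\sigma_{ik}$ is the standard deviation of the ith signature
for target type $A_k$, and

$$ V_k = (2\pi)^{M/2} \prod_{i=1}^{M} \sigma_{ik}. $$

Then Equation (25) becomes

$$ \sum_{k=1}^{N} P(A_k)\, r_{jk}\, \frac{1}{V_k} \prod_{p=1}^{M} e^{-(a_p-\mu_{pk})^2/2\sigma_{pk}^2} < \sum_{k=1}^{N} P(A_k)\, r_{ik}\, \frac{1}{V_k} \prod_{p=1}^{M} e^{-(a_p-\mu_{pk})^2/2\sigma_{pk}^2} \tag{27} $$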


The flow diagram of Figure 3 shows how a maximum
likelihood  classifier  might  be  implemented. 
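
A minimal sketch of such a classifier for several independent signatures is given below. It is not a transcription of the flow diagram: the data layout and names are illustrative, the velocity statistics are those assumed in the text, and the RCS statistics are hypothetical placeholders.

```python
import math

# Per class: a priori probability and (mean, standard deviation) per signature.
# Signature order: radial velocity (mph), RCS (m^2). RCS values are hypothetical.
CLASSES = {
    "tank": {"prior": 0.75, "features": [(16.0, 5.0), (60.0, 20.0)]},
    "jeep": {"prior": 0.25, "features": [(35.0, 10.0), (10.0, 5.0)]},
}
# Cost keyed by (decided, actual); the equal-threat values are used here.
COST = {("tank", "tank"): 0.0, ("tank", "jeep"): 1.0,
        ("jeep", "tank"): 1.0, ("jeep", "jeep"): 0.0}

def joint_density(measurement, features):
    """Product of univariate Gaussians, valid only for independent signatures."""
    density = 1.0
    for a, (mean, std) in zip(measurement, features):
        density *= math.exp(-((a - mean) ** 2) / (2.0 * std ** 2)) / (math.sqrt(2.0 * math.pi) * std)
    return density

def classify(measurement):
    """Loop once over the N classes, then pick the decision of minimum cost."""
    weighted = {name: c["prior"] * joint_density(measurement, c["features"])
                for name, c in CLASSES.items()}
    risks = {decided: sum(COST[(decided, actual)] * weighted[actual] for actual in CLASSES)
             for decided in CLASSES}
    return min(risks, key=risks.get)

print(classify((15.0, 55.0)))
print(classify((38.0, 12.0)))
print(classify((27.0, 60.0)))
```

The third call shows the benefit of a second discriminant: 27 mph alone favors a jeep, but a large RCS moves the decision to a tank under the assumed statistics.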

4.  THE  NEAREST  NEIGHBOR  CLASSIFIER 

The  nearest  neighbor  classifier  is  the  least  complicated 
(most  straightforward)  of  the  three  classification 
techniques  but  also  involves  the  most  memory  storage  and 
computational load. It is a nonparametric technique in which
all the training data are stored and used in classification,
as opposed to the maximum likelihood method, where all the
training data are reduced to three numbers per target type
for each target discriminant (mean, standard deviation, and
a priori probability).

Figure 3.  Maximum likelihood classifier

Considering the same example as in the previous section,
we  have  tanks  and  jeeps  with  radial  velocity  as  a 
discriminant.  The  data  stored  in  the  memory  consists  of 
measured  velocity  values  and  the  target  observed  for  each 
value.  Consider  the  following  data  taken  at  random  in  a 
hypothetical  battlefield  scenario: 

Tank Velocities (mph)          Jeep Velocities (mph)

v1  =  2                       v16 = 19
v2  =  5                             30
       6                             35
       9                             40
      10                       v20 = 50
      11
      14
      15
      16
      17
      18
      ...
      23
      24
v15 = 25


These  numbers  were  chosen  to  somewhat  resemble  the  Gaussian 
distributions  of  Figure  2.  In  each  memory  location,  we  store 
the measured velocity $v_i$ and the associated target type,
$A_k$. For an unknown target with velocity v, we compute all
$|v - v_i|$ and search for the minimum. If

$$ |v - v_j| < |v - v_i| \text{ for all } i \neq j, \tag{28} $$

the target is identified as the one associated with $v_j$.

For  example,  if  v  =  27  mph,  the  "nearest  neighbor"  would  be 
the  tank  at  25  mph.  For  v  =  20  mph,  the  nearest  neighbor  is 
a  jeep  at  19  mph.  This  is  the  classical  NN  (nearest 
neighbor)  rule.  If  we  think  about  all  of  the  target  data 
being  plotted  on  a  velocity  line,  the  nearest  neighbor  is 
the  data  point  closest  to  the  unknown.  The  a  priori 
probabilities  are  taken  into  account  by  the  number  of  tank 
neighbors  being  larger  than  jeep  neighbors  by  three  to  one. 
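
The classical NN rule can be sketched in a few lines. The velocities are those tabulated above; only the 14 tank values that can be read from the table are used, so the list is one entry short of the 15 tank samples.

```python
# Training velocities (mph). One tank entry could not be read from the table,
# so only the 14 legible tank values are included.
TANKS = [2, 5, 6, 9, 10, 11, 14, 15, 16, 17, 18, 23, 24, 25]
JEEPS = [19, 30, 35, 40, 50]
TRAINING = [(v, "tank") for v in TANKS] + [(v, "jeep") for v in JEEPS]

def nearest_neighbor(v, training=TRAINING):
    """Return (velocity, label) of the stored sample closest to v."""
    return min(training, key=lambda sample: abs(v - sample[0]))

print(nearest_neighbor(27))
print(nearest_neighbor(20))
```

The two calls reproduce the examples in the text: the tank at 25 mph for v = 27 mph and the jeep at 19 mph for v = 20 mph. Every stored sample is visited on each call, which is the source of the high computational load.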

Clearly,  a  lot  of  data  and  high  precision  (several 
significant  figures)  are  desirable.  Also,  one  must  make 
provisions  for  several  targets  having  the  same  velocity.  To 
avoid  this  problem,  one  can  invoke  the  k-NN  rule.  If  k  is, 
for  example,  11,  the  11  nearest  neighbors  of  the  unknown 
target  would  be  found.  The  unknown  would  be  assigned  to  the 
majority class of these nearest neighbors; if 6 or more
nearest neighbors were tanks and 5 or fewer were jeeps, the
unknown  would  be  classified  as  a  tank. 
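
A sketch of the k-NN majority vote follows, using the legible training velocities from the table. The function name and the tie handling are illustrative assumptions; the sort is stable, so samples at equal distance keep their stored order.

```python
from collections import Counter

# Legible training velocities (mph); one tank entry is missing from the table.
TANKS = [2, 5, 6, 9, 10, 11, 14, 15, 16, 17, 18, 23, 24, 25]
JEEPS = [19, 30, 35, 40, 50]
TRAINING = [(v, "tank") for v in TANKS] + [(v, "jeep") for v in JEEPS]

def knn_classify(v, k=11, training=TRAINING):
    """Majority vote among the k stored samples closest to v."""
    neighbors = sorted(training, key=lambda sample: abs(v - sample[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], dict(votes)

print(knn_classify(40, k=3))
print(knn_classify(20, k=3))
print(knn_classify(40, k=11))
```

With only five jeep samples stored, k = 11 can never return a jeep, even at 40 mph. In practice k has to be chosen with the size of the smallest class in mind.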

One  can  inject  the  threat  potential  assessment  into  the 
computation by assigning a constant to each target type,
$r_k$. Since tanks are a greater threat than jeeps, we might
assign tanks a factor of $r_T$ = .8 and jeeps a factor of
$r_J$ = 1. The decision rule (28) becomes

$$ r_j\,|v - v_j| < r_i\,|v - v_i| \text{ for all } i \neq j. \tag{29} $$


The  net  effect  is  to  make  tanks  appear  to  be  nearer 
neighbors  than  they  actually  are. 

Also,  one  can  set  thresholds  for  decisionmaking,  below 
which  no  decision  is  made.  In  the  classical  NN  rule,  one  can 
require  that  the  nearest  neighbor  be  "close  enough"  to  the 
unknown.  For  the  example  given,  the  threshold  might  be  set 
at $|v - v_j|$ = 2 mph. If the nearest neighbor were more
than 2 mph away, the unknown would not be classified. In the
k-NN  rule,  k  =  11,  one  might  require  more  than  a  majority, 
say  7  out  of  11,  for  a  decision. 
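
The threat weighting and the distance threshold can be combined in one routine, sketched below. The threat factors are those given in the text. Applying the threshold to the unweighted distance is an assumption of this sketch, as are the function and parameter names.

```python
# Legible training velocities (mph); one tank entry is missing from the table.
TANKS = [2, 5, 6, 9, 10, 11, 14, 15, 16, 17, 18, 23, 24, 25]
JEEPS = [19, 30, 35, 40, 50]
TRAINING = [(v, "tank") for v in TANKS] + [(v, "jeep") for v in JEEPS]

THREAT = {"tank": 0.8, "jeep": 1.0}   # factors r_T and r_J from the text
UNIT = {"tank": 1.0, "jeep": 1.0}     # no threat weighting

def weighted_nearest(v, threat=THREAT, max_distance=None):
    """Minimize r * |v - v_i|; return None when the winning sample is farther
    than max_distance (mph) from the unknown, i.e. no decision is made."""
    velocity, label = min(TRAINING, key=lambda s: threat[s[1]] * abs(v - s[0]))
    if max_distance is not None and abs(v - velocity) > max_distance:
        return None
    return label

print(weighted_nearest(27.7, threat=UNIT))
print(weighted_nearest(27.7))
print(weighted_nearest(45, max_distance=2.0))
```

At 27.7 mph the unweighted rule selects the jeep at 30 mph, while the threat factor of .8 makes the tank at 25 mph the nearer neighbor. At 45 mph no stored sample lies within 2 mph, so no decision is returned.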

Another technique which can be used is the weighted k-NN
rule, in which the k nearest neighbors are weighted
inversely proportional to their distance from the
unclassified target. For example, we classify the target as
a tank if

$$ \sum_{\mathrm{TANKS},\,i} \frac{1}{r_T\,|v - v_i|} > \sum_{\mathrm{JEEPS},\,j} \frac{1}{r_J\,|v - v_j|}, \tag{30} $$

where each sum runs over those of the k nearest neighbors
that belong to the indicated class.

Provisions must be made for the computational problems
resulting from $|v - v_i|$ = 0.
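
A sketch of the weighted k-NN rule is given below. The small floor `eps` on the distance is one possible provision for the zero-distance case and is an assumption of this sketch, not a recommendation from the report.

```python
# Legible training velocities (mph); one tank entry is missing from the table.
TANKS = [2, 5, 6, 9, 10, 11, 14, 15, 16, 17, 18, 23, 24, 25]
JEEPS = [19, 30, 35, 40, 50]
TRAINING = [(v, "tank") for v in TANKS] + [(v, "jeep") for v in JEEPS]

THREAT = {"tank": 0.8, "jeep": 1.0}   # factors r_T and r_J from the text
UNIT = {"tank": 1.0, "jeep": 1.0}

def weighted_knn(v, k=3, threat=THREAT, eps=1e-9):
    """Sum 1 / (r * distance) per class over the k nearest neighbors."""
    neighbors = sorted(TRAINING, key=lambda sample: abs(v - sample[0]))[:k]
    score = {"tank": 0.0, "jeep": 0.0}
    for velocity, label in neighbors:
        distance = max(abs(v - velocity), eps)   # avoid division by zero
        score[label] += 1.0 / (threat[label] * distance)
    return max(score, key=score.get)

print(weighted_knn(20, threat=UNIT))
print(weighted_knn(20))
print(weighted_knn(19))
```

For v = 20 mph and k = 3, the jeep at 19 mph outweighs the two tank neighbors when no threat factor is applied, and the tanks prevail once the factor of .8 is included. An exact match, as at 19 mph, dominates the sum.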

The  utility  of  the  NN  classifier  becomes  more  apparent 
when  extended  to  more  than  one  target  signature.  If  there 
are  M  target  signatures  (features),  instead  of  being  a  point 
on  a  line,  each  training  target  measurement  becomes  a  point 
in  M-dimensional  feature  space.  Nearest  neighbors  are  then 
determined  by  their  distance  in  feature  space  from  the 
unknown. 

Consider  the  two-dimensional  classifier  using  target 
extent  in  elevation,  z,  and  azimuth,  y,  determined  from 
angular  profile  and  range  data.  Plotted  in  two  dimensions, 
typical  training  data  might  resemble  Figure  4.  For  this 
case,  the  decision  rule  is 


$$ d_j < d_i \text{ for all } i \neq j \tag{31} $$

or

$$ \sqrt{(y-y_j)^2 + (z-z_j)^2} < \sqrt{(y-y_i)^2 + (z-z_i)^2}. \tag{32} $$


Although it is not readily apparent from this example, this
simple formulation is flawed in that it tends to weight one
feature more heavily than another. For example, targets are
generally longer than they are high, but not necessarily
wider. Thus, because of variations in yaw aspect angle, we
expect a large range of values for the $y_i$ and a small
range for the $z_i$. Thus, the $(y-y_i)^2$ terms will
dominate over the smaller $(z-z_i)^2$ terms. This disparity
becomes more visible if we use as features (1) the radial
velocity, which perhaps varies between ~5 mph and ~50 mph,
and (2) RCS, which might vary between 5 m² and 500 m². If
we use Equation (32), the RCS will dominate over the
velocity.

Figure 4.  Target extent training data

Clearly,  some  form  of  "normalization"  depending  on  the 
spread  of  data  will  probably  be  needed.  We  can  "normalize" 
by  dividing  each  feature  by  the  sample  standard  deviation 
for  the  target  signature  y  given  by 


where n = total number of training samples and $\mu_y$ is the
sample mean given by

$$ \mu_y = \frac{1}{n}\sum_{i=1}^{n} y_i. \tag{34} $$


The  decision  rule  then  becomes 


for all i ≠ j.  (35)


For an M-dimensional feature space (M types of target
signatures), $x_{ki}$, k = 1, 2, ..., M, where i denotes the ith
target and $x_k$ is the unknown, the decision rule is

for all i ≠ j.  (36)


The implementation of the NN classification is similar to
that shown in Figure 3, except that the computation loop is
traversed n times (n = total number of training samples)
instead of only N times (number of target classes).
Modifications needed to implement the k-NN rule and the
weighted k-NN rule will result in an additional
computational load. In addition, memory requirements
increase to nM + 2MN + N, where nM is the number of sample
data, 2MN is the number of means and standard deviations,
and N is the number of weighting (threat) factors.
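
The effect of normalizing each feature by its spread can be seen in a small sketch. The training samples below are hypothetical, and the spread is taken as the standard deviation over all n training samples with divisor n; both are assumptions made only for this illustration.

```python
import math

# Hypothetical training samples: (radial velocity in mph, RCS in m^2), label.
TRAINING = [
    ((17.0, 160.0), "tank"), ((14.0, 300.0), "tank"), ((20.0, 450.0), "tank"),
    ((40.0, 90.0), "jeep"), ((35.0, 20.0), "jeep"), ((45.0, 8.0), "jeep"),
]

def feature_spreads(training):
    """Standard deviation of each feature over all training samples (divisor n)."""
    spreads = []
    for column in zip(*(features for features, _ in training)):
        mean = sum(column) / len(column)
        spreads.append(math.sqrt(sum((x - mean) ** 2 for x in column) / len(column)))
    return spreads

def nearest(unknown, training, spreads=None):
    """Nearest neighbor in feature space; each feature is divided by its spread."""
    if spreads is None:
        spreads = [1.0] * len(unknown)           # raw, unnormalized distance
    def distance(features):
        return math.sqrt(sum(((u - f) / s) ** 2
                             for u, f, s in zip(unknown, features, spreads)))
    return min(training, key=lambda sample: distance(sample[0]))[1]

unknown = (16.0, 100.0)
print(nearest(unknown, TRAINING))
print(nearest(unknown, TRAINING, feature_spreads(TRAINING)))
```

Without normalization the RCS term dominates and the slow-moving unknown is matched to a jeep with a similar RCS. After normalization the velocity carries comparable weight and the nearest neighbor is a tank.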

It  should  be  pointed  out  that  there  are  many  variations 
of  the  NN  classifier  which  have  not  been  discussed.  These 
include  techniques  for  reducing  the  size  of  the  training 
data  set  and  computational  tricks  to  increase  processing 
speed [4].

5.  THE  LINEAR  CLASSIFIER 

With  the  maximum  likelihood  classifier,  a  complete 
knowledge  of  the  probability  distribution  of  the  target 
signatures  was  assumed,  and  this  assumption  resulted  in 
small  computational  loads.  In  the  NN  classifier,  essentially 
nothing  was  assumed,  and  memory  and  computation  requirements 
were large. The linear classifier is a nonparametric
technique with assumptions about the distribution and field
computational load lying somewhere between the extremes of
the  maximum  likelihood  classifier  and  nearest  neighbor 
classifier. 

The  assumption  made  in  the  linear  classifier  is  that  the 
target  classes  can  be  separated  in  multidimensional  feature 
space  by  discriminant  surfaces  called  hyperplanes.  To 
understand what the previous statement means, consider the
two-feature case of Figure 5a, where the discriminants are
measured target extent in elevation and azimuth. In this
case, the "hyperplane" is a line. We assume that we can find
the equation for a straight line that separates all the
tanks from all the jeeps, so that any unknown can be
classified according to the side of the line on which it
falls. In fact, we assume that even when no straight line
exists that separates the data, we can apply an algorithm to
the data which will give us the "best" line, i.e., the line
that will give us the least probability of misclassification.


4.  B.V. Dasarathy, A Study of Nearest Neighbor
Classification Techniques in the Context of Millimeter
Radar Target Recognition and Selection Applications, M&S
Computing, Inc., Report No. 78-017, March 1978.

We  now  add  target  extent  in  range  to  elevation  and 
azimuth  extent,  so  that  we  have  a  three-dimensional  feature 
space,  as  shown  in  Figure  5b.  In  this  case  the  hyperplane 
is a plane separating the targets in space. If we extend
this treatment to M types of target signatures, $x_i$, i =
1, 2, 3, ..., M, the discriminant surface is a "hyperplane"
in M-dimensional space.

Considering  now  the  treatment  of  three  or  more  target 
classes (number of classes = N), we have the option of
trying  to  find  a  discriminant  surface  for  each  class  which 
separates  that  class  from  all  others,  or  we  can  find 
surfaces  which  separate  each  pair  of  classes.  The  first  case 
requires  N  hyperplanes,  resulting  in  a  small  computational 
load  for  determining  unknowns,  but  the  linear  separability 
assumption  is  likely  to  be  less  valid. 

The latter case requires N(N - 1)/2 hyperplanes and
more  computations  to  determine  unknowns.  However,  it  often 
gives  a  higher  accuracy,  because  a  given  target  class  is 
more  likely  to  be  linearly  separable  from  each  class 
individually  than  from  all  other  classes  lumped  together. 

The  latter  case,  however,  also  opens  up  the  possibility 
of  regions  of  feature  space  for  which  an  unknown  is  not 
uniquely  identifiable.  For  example,  consider  the  two- 
dimensional  feature  space  with  three  target  classes  and 
separating  lines  shown  in  Figure  6.  Any  unknown  falling  in 
the  cross-hatched  area  cannot  be  classified. 
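
The pairwise scheme and its ambiguous region can be sketched as follows. The three classes, the third class name, and all line coefficients are hypothetical and were chosen so that the three lines enclose a small triangle in which each class wins exactly one pairwise test.

```python
# Each entry: (class if g > 0, class if g <= 0, (a1, a2, a0)),
# where g = a1 * x + a2 * y + a0. All values are hypothetical.
HYPERPLANES = [
    ("tank", "jeep", (1.0, 0.0, -1.0)),
    ("jeep", "truck", (0.0, 1.0, -1.0)),
    ("truck", "tank", (-1.0, -1.0, 3.0)),
]
CLASSES = ("tank", "jeep", "truck")

def classify(x, y):
    """Evaluate all N(N - 1)/2 pairwise tests and count the wins per class."""
    votes = {name: 0 for name in CLASSES}
    for positive, negative, (a1, a2, a0) in HYPERPLANES:
        winner = positive if a1 * x + a2 * y + a0 > 0 else negative
        votes[winner] += 1
    best = max(votes, key=votes.get)
    # A class is uniquely identified only if it wins all N - 1 of its tests.
    return best if votes[best] == len(CLASSES) - 1 else None

print(classify(3.0, 2.0))
print(classify(0.0, 2.0))
print(classify(2.0, 0.5))
print(classify(1.2, 1.2))
```

The last point lies inside the triangle formed by the three lines, so every class receives one vote and no classification is returned.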

Figure 6.  Ambiguity in two-dimensional feature space.


The Ho-Kashyap algorithm can be used to determine the
hyperplanes [5], and this algorithm is first discussed for
two target discriminants: extent in azimuth (x) and
elevation (y) as in Figure 5a. (The $x_i$ and $y_i$ are
normalized as before.) We wish to find the straight line
that separates measurements obtained from m tanks, $x_{Ti}$,
$y_{Ti}$, i = 1, 2, ..., m, and from n jeeps, $x_{Jj}$, $y_{Jj}$,
j = 1, 2, ..., n. The line equation is given by

$$ \alpha_1 x + \alpha_2 y + \alpha_0 = 0, \tag{37} $$

where $\alpha_1$, $\alpha_2$, and $\alpha_0$ are constants to be determined
from an iteration procedure. We require that all tank data
lie outside the line,

$$ \alpha_1 x_{Ti} + \alpha_2 y_{Ti} + \alpha_0 > 0 \text{ for all } i, \tag{38} $$

and that all jeep data lie inside the line,

$$ \alpha_1 x_{Jj} + \alpha_2 y_{Jj} + \alpha_0 < 0 \text{ for all } j. \tag{39} $$


5.  Yu-Chi Ho and R.L. Kashyap, "A Class of Iterative
Procedures for Linear Inequalities," J. SIAM Control,
Vol. 4, 1966, pp. 112-115.


Now  Equation  (39)  can  be  rewritten  as 


$$ -\alpha_1 x_{Jj} - \alpha_2 y_{Jj} - \alpha_0 > 0 \text{ for all } j. \tag{40} $$

Thus we have n + m linear inequalities, which may be written
in matrix form as

$$ A\,\alpha > 0. \tag{41} $$

We introduce the variable vector
$\beta = (\beta_1, \beta_2, \ldots, \beta_m, \beta_{m+1}, \ldots, \beta_{m+n})$
such that we must find $\alpha$ for which

$$ A\,\alpha - \beta = 0 \tag{42} $$

and

$$ \beta > 0. \tag{43} $$

"The introduction of the vector $\beta$ as an additional variable
plays a crucial role in the convergence rate of the
algorithm without any appreciable increase in computational
complexity." [5] Before convergence, $A\alpha - \beta \neq 0$, so we
introduce the vector $\gamma$ defined by

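$$ \gamma = A\,\alpha - \beta. \tag{44} $$

We also introduce the scalar constant $\rho$ and the symmetric
matrix S (in this case S is a 3 x 3 matrix). One possible
choice of $\rho$ and S is

$$ 0 < \rho < 2 \tag{45} $$

$$ S = (A^T A)^{-1} \tag{46} $$

where $A^T$ is the transpose of A,

$$ A^T = \begin{bmatrix} x_{T1} & x_{T2} & \cdots & x_{Tm} & -x_{J1} & -x_{J2} & \cdots & -x_{Jn} \\ y_{T1} & y_{T2} & \cdots & y_{Tm} & -y_{J1} & -y_{J2} & \cdots & -y_{Jn} \\ 1 & 1 & \cdots & 1 & -1 & -1 & \cdots & -1 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} x_{T1} & y_{T1} & 1 \\ \vdots & \vdots & \vdots \\ x_{Tm} & y_{Tm} & 1 \\ -x_{J1} & -y_{J1} & -1 \\ \vdots & \vdots & \vdots \\ -x_{Jn} & -y_{Jn} & -1 \end{bmatrix}. \tag{47} $$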


S is then the inverse of $A^T A$, which is found from the
determinants of the cofactors and the matrix determinant.

It can be shown that the algorithm

$$ (\alpha)_{i+1} = (\alpha)_i + \rho\,S A^T\,|(\gamma)_i|, \quad (\alpha)_0 \text{ arbitrary}, \tag{49} $$

$$ (\beta)_{i+1} = (\beta)_i + \left[(\gamma)_i + |(\gamma)_i|\right], \quad (\beta)_0 > 0 \text{ but arbitrary otherwise}, \tag{50} $$

converges to the solution of $A\alpha - \beta = 0$ in a finite number
of steps provided a solution exists. The subscripts here
denote the number of iterations performed, and $|(\gamma)_i|$ is
the vector whose components are the magnitudes of the
components of $(\gamma)_i$. Initially we choose $\rho$,
$(\alpha)_0 = \{(\alpha_1)_0, (\alpha_2)_0, (\alpha_0)_0\}$, and
$(\beta)_0 = \{(\beta_1)_0, (\beta_2)_0, \ldots, (\beta_{n+m})_0\}$, plug
these into Equation (44), and compute

$$ (\gamma)_0 = A\,(\alpha)_0 - (\beta)_0. $$

Then, using Equations (49) and (50), compute

$$ (\alpha)_1 = (\alpha)_0 + \rho\,S A^T\,|(\gamma)_0| $$

$$ (\beta)_1 = (\beta)_0 + \left[(\gamma)_0 + |(\gamma)_0|\right]. $$

Returning to Equation (44), we find $(\gamma)_1$,

$$ (\gamma)_1 = A\,(\alpha)_1 - (\beta)_1, $$

and continue the procedure until $(\gamma)_i \to 0$.


Now  if  there  is  no  solution  (i.e.,  the  tank  and  jeep 
data  overlap  so  that  the  discriminant  line  does  not  exist), 
we can continue the iteration process until the $(\alpha)_i$ do
not change very much from one iteration to the next. For
example, we might stop at the 76th iteration, provided we
have a one percent accuracy:

$$ \left|\frac{(\alpha_j)_{76} - (\alpha_j)_{75}}{(\alpha_j)_{75}}\right| < .01 \text{ for } j = 0, 1, 2, \tag{55} $$

where the $\alpha_j$ are the components of the vector $\alpha$.

It  is  clear  that  even  for  the  case  of  the  two  target 
types and two discriminants, the computation of $\alpha_1$,
$\alpha_2$, and $\alpha_0$ for Equation (37) is quite involved and
normally needs to be done on a mainframe computer. However,
once these three constants are determined, they are all that
needs to be stored in the field microprocessor. If we
measure x and y for an unknown, the target is identified as
a tank if

$$ \alpha_1 x + \alpha_2 y + \alpha_0 > 0. \tag{56} $$
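
The iteration and the field decision rule can be sketched together. The training extents below are hypothetical, rho = 1 is one choice within the stated range, and the stopping tolerance and iteration limit are assumptions of this sketch. The 3 x 3 inverse is formed from cofactors and the determinant, as described for S.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def inverse_3x3(m):
    """Inverse from the cofactors and the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    cof = [
        [e * i - f * h, -(d * i - f * g), d * h - e * g],
        [-(b * i - c * h), a * i - c * g, -(a * h - b * g)],
        [b * f - c * e, -(a * f - c * d), a * e - b * d],
    ]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    # The inverse is the transposed cofactor matrix divided by the determinant.
    return [[cof[col][row] / det for col in range(3)] for row in range(3)]

def ho_kashyap(tanks, jeeps, rho=1.0, tol=1e-6, max_iter=10000):
    """Iterate on alpha and beta until gamma = A alpha - beta is negligible."""
    # Rows of A: (x, y, 1) for tanks and (-x, -y, -1) for jeeps.
    A = [[x, y, 1.0] for x, y in tanks] + [[-x, -y, -1.0] for x, y in jeeps]
    At = transpose(A)
    S_At = matmul(inverse_3x3(matmul(At, A)), At)    # S A^T with S = (A^T A)^-1
    alpha = [0.0, 0.0, 0.0]                          # arbitrary starting vector
    beta = [1.0] * len(A)                            # must start positive
    for _ in range(max_iter):
        gamma = [p - b for p, b in zip(matvec(A, alpha), beta)]
        if max(abs(g) for g in gamma) < tol:
            break
        abs_gamma = [abs(g) for g in gamma]
        alpha = [a + rho * s for a, s in zip(alpha, matvec(S_At, abs_gamma))]
        beta = [b + g + ag for b, g, ag in zip(beta, gamma, abs_gamma)]
    return alpha

def is_tank(alpha, x, y):
    """Field decision rule: only the three constants need to be stored."""
    return alpha[0] * x + alpha[1] * y + alpha[2] > 0

# Hypothetical normalized extents (azimuth x, elevation y).
TANKS_XY = [(6.0, 2.5), (7.0, 3.0), (6.5, 2.8), (7.5, 2.6)]
JEEPS_XY = [(3.0, 1.5), (3.5, 1.8), (4.0, 1.6), (3.2, 1.7)]

alpha = ho_kashyap(TANKS_XY, JEEPS_XY)
print(alpha)
print(is_tank(alpha, 6.8, 2.7), is_tank(alpha, 3.3, 1.6))
```

The training loop is the expensive part; the field computation in `is_tank` is two multiplications, two additions, and one comparison.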


Now  consider  the  problem  for  two  target  types  and  M 
types of target signatures $x_i$, i = 1, 2, ..., M. The
computation of the M + 1 constants ($\alpha_1, \alpha_2, \ldots, \alpha_M,
\alpha_0$) is considerably more difficult because the vectors
involved now have M + 1 components and the matrix S is now
(M + 1) x (M + 1). The computational and storage
requirements for the field processor are not significantly
altered, however. The decision rule becomes

$$ \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_M x_M + \alpha_0 > 0. \tag{57} $$


Now  consider  the  extension  to  N  target  classes.  There 
will  be  a  plane  for  each  of  the  1/2  (N)  (N  -  1)  target 
pairs,  so  that  we  must  store  1/2  (M  +  1)  N  (N  -  1)  constants 
in  the  field  microprocessor  and  must  make  between  N  -  1  and 
1/2  N  (N  -  1)  computations.  A  flow  diagram  showing  how  the 
microprocessor  might  be  programmed  is  shown  in  Figure  7. 
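One way to reach the lower bound of N - 1 computations is to keep a running candidate class and test it against each remaining class in turn. The sketch below assumes this elimination ordering; Figure 7 is not reproduced here, so its actual flow may differ. The storage layout `planes[(i, j)]`, the function names, and the one-discriminant example are hypothetical.

```python
def count_constants(num_classes, num_discriminants):
    """Constants to store: (M + 1) per plane, one plane per class pair."""
    pairs = num_classes * (num_classes - 1) // 2
    return (num_discriminants + 1) * pairs


def classify_by_elimination(planes, num_classes, signature):
    """Return the surviving class index after N - 1 plane evaluations.

    planes[(i, j)] with i < j holds (alpha, alpha_0); a positive score
    favors class i, otherwise class j.
    """
    candidate = 0
    for challenger in range(1, num_classes):
        # candidate is always a lower index than challenger here
        alpha, alpha_0 = planes[(candidate, challenger)]
        score = alpha_0 + sum(a * x for a, x in zip(alpha, signature))
        if score <= 0:
            candidate = challenger
    return candidate


# Hypothetical example: three classes separated along one discriminant x.
planes = {
    (0, 1): ([-1.0], 2.0),   # class 0 if x < 2.0
    (0, 2): ([-1.0], 3.5),   # class 0 if x < 3.5
    (1, 2): ([-1.0], 5.0),   # class 1 if x < 5.0
}
print(count_constants(3, 1))
print(classify_by_elimination(planes, 3, [3.0]))
```

Evaluating all N(N - 1)/2 planes and voting corresponds to the upper bound quoted in the text. The elimination scheme trades that redundancy for speed.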

6.  MICROPROCESSORS  FOR  TARGET  CLASSIFICATION 

One  conclusion  drawn  in  a  recent  study  of  target 
classification⁶ is that the use of multiple discriminants
is  likely  to  be  essential  to  reliable  classification.  The 
three  independent  classification  techniques  discussed  all 
appear  applicable  to  multiple  discriminant  analysis.  The 
nonparametric  techniques  are  straightforward  in  their  use  of 
the  training  data.  In  the  maximum  likelihood  classifier,  one 
would  ideally  use  extensive  analysis  to  find  exact 
expressions  for  the  probability  densities.  However,  the 
basic  assumption  that  the  distributions  are  Gaussian  is  more 
credible  than  it  might  first  appear  because  of  the  central 
limit theorem. For practical applications, the importance of
this theorem lies in the fact that the mean of n random
samples from any distribution with mean $\mu$ and finite standard
deviation $\sigma$ approximates a Gaussian distribution for large n;
i.e., the mean is distributed as a normal variate with mean $\mu$
and standard deviation $\sigma/\sqrt{n}$.
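The central limit theorem statement above can be checked numerically. The following sketch is an illustration only: the choice of a uniform parent distribution, the sample size n = 25, the number of trials, and the seed are all assumptions made for the demonstration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 25           # samples averaged per trial
trials = 20000   # number of sample means generated

# Uniform distribution on [0, 1): mean 1/2, standard deviation 1/sqrt(12).
mu = 0.5
sigma = 1.0 / math.sqrt(12.0)

sample_means = [
    sum(random.random() for _ in range(n)) / n
    for _ in range(trials)
]

observed_mean = statistics.mean(sample_means)
observed_std = statistics.pstdev(sample_means)
predicted_std = sigma / math.sqrt(n)

# For a normal variate, about 68 percent of values lie within one standard deviation.
within_one_std = sum(abs(m - mu) < predicted_std for m in sample_means) / trials

print(f"mean of sample means: {observed_mean:.4f} (predicted {mu:.4f})")
print(f"std of sample means:  {observed_std:.4f} (predicted {predicted_std:.4f})")
print(f"fraction within one std: {within_one_std:.3f}")
```

Although the parent distribution is flat rather than bell shaped, the spread of the averaged values should come out close to $\sigma/\sqrt{n}$.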

While  each  of  the  classifiers  appears  usable,  the 
question  remains,  which  one  is  best?  Clearly  the  answer 
depends  on  the  actual  distribution  of  the  data. 


6. Robert  G.  Shackelford  and  James  J.  Gallagher,  Isolation, 
Classification  and  Location  of  Targets  with  Millimeter 
Systems,  Contract  Report  for  US  Army  Missile  Research 
and  Development  Command,  Advanced  Sensors  Directorate, 
1978.

Dasarathy, in a preliminary effort, has compared the
three techniques using artificially generated data for
target extent in azimuth, elevation, and range. He found no
significant differences in the recognition rates for the
classifiers. Naturally, the results of any simulation will
depend heavily on how the test data is generated, so that
actual data will be required for any final rating of the
classifiers.

To assess the applicability of microprocessors to field
classification, consider the case for five target types and
five target discriminants. Since we might expect that no
classification improvements result from training data less
than 1 degree apart in aspect angle, let us (assuming some
target symmetry) take n, the total number of training
samples, to be 900, or an average of 180 samples per target.
The number of constants stored in the microprocessor is 75
for the linear classifier, 85 for the maximum likelihood
classifier, and 4,555 for the nearest neighbor classifier.
The point of these results is that, as far as computation
and memory requirements are concerned, if one is going to
use the linear classifier, one might as well also use the
maximum likelihood classifier (and vice versa); and if one
is going to use the nearest neighbor technique, one might
as well use all three classifiers. The use of some
combination of the classifiers would seem optimum, since if
neither of the faster classifiers (maximum likelihood and
linear classifiers) gave a firm decision, the nearest
neighbor (NN) classifier would be available. Otherwise, one
could use a majority rule, or a weighted majority rule if the
classifiers could be rated according to their expected
performance.
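The combination strategy described above can be sketched as follows. The report does not define what constitutes a "firm decision", so this sketch assumes a classifier returns `None` when it is not firm. The function names, the tie handling, and the example weights are likewise hypothetical.

```python
from collections import defaultdict


def weighted_majority(votes, weights=None):
    """Return the label with the largest total weight.

    votes   -- dict mapping classifier name to the label it chose
    weights -- optional dict mapping classifier name to its weight;
               every classifier counts 1.0 when omitted
    Ties go to the label that was voted for first (an arbitrary choice).
    """
    totals = defaultdict(float)
    for name, label in votes.items():
        totals[label] += 1.0 if weights is None else weights[name]
    return max(totals, key=totals.get)


def classify_with_fallback(linear_label, ml_label, nearest_neighbor):
    """Use the two fast classifiers first; run the slow one only if needed.

    nearest_neighbor is a zero-argument callable so that the costly
    nearest neighbor search is performed only on demand.
    """
    firm = [label for label in (linear_label, ml_label) if label is not None]
    if not firm:
        return nearest_neighbor()      # neither fast classifier was firm
    if len(set(firm)) == 1:
        return firm[0]                 # the firm decisions agree
    # The fast classifiers disagree: the NN classifier casts the third vote.
    return weighted_majority(
        {"linear": linear_label, "ml": ml_label, "nn": nearest_neighbor()}
    )


print(classify_with_fallback(None, None, lambda: "jeep"))
print(classify_with_fallback("tank", "jeep", lambda: "jeep"))
```

Passing the nearest neighbor classifier as a callable reflects the point made in the text: it is the expensive classifier, so it is consulted only when the faster two do not settle the matter.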

A  microprocessor  to  be  used  in  such  a  classification 
system  would  require  a  reasonably  large  permanent  memory  for 
the  storage  of  constants  and  computation  programs,  but  a 
relatively  small  temporary  memory.  Temporary  memory  is 
usually  referred  to  as  RAM  (random  access  memory),  which 
provides  immediate  access  to  all  memory  storage  locations. 
The  permanent  memory  is  called  the  ROM  (read  only  memory) 
which  may  be  programmed  by  a  mask  pattern  in  the  last 
manufacturing  stage  or  may  be  programmed  in  the  field  using 
suitable  equipment.  In  the  latter  case,  the  memory  is  called 
a PROM (programmable read only memory). Program data stored
in  the  ROM  cannot  be  altered  and  for  that  reason  is  often 
called  firmware.  Memory  storage  is  measured  in  bytes 
(computer  words)  usually  of  4  or  8  bits  (one  bit  =  one 
on-off switch).

There are currently available single board microcomputers
(Motorola, Texas Instruments, Zilog, etc.) which provide
8 k byte PROM capacity (k = 1,024) and from 512 to 4 k bytes
of RAM, all at a nominal cost of $300 (for a quantity of
100). The PROM capacity would appear to be adequate for the
example given, since, at one byte per constant, 3,477 bytes
would be left for program instructions after the storage of
4,715 constants.
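The memory budget quoted above can be reproduced with a few lines of arithmetic. The constant counts are taken from the text. The one-byte-per-constant figure is an assumption inferred from the report's numbers, and the variable names are illustrative.

```python
K = 1024                  # the report's definition of k
prom_bytes = 8 * K        # 8 k byte PROM

# Stored constants per classifier, as quoted in the text.
constants = {
    "linear": 75,
    "maximum likelihood": 85,
    "nearest neighbor": 4555,
}
bytes_per_constant = 1    # assumption implied by the report's arithmetic

total_constants = sum(constants.values())
remaining = prom_bytes - total_constants * bytes_per_constant

print(total_constants)    # 4715
print(remaining)          # 3477
```

The margin is sensitive to word length: at two bytes per constant the constants alone would need 9,430 bytes, which exceeds the 8,192-byte PROM.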

7.  SUMMARY  AND  CONCLUSIONS 

This  report  addresses  the  subject  matter  in  a  highly 
fundamental  manner.  The  basics  of  the  three  classifiers  have 
been  presented,  but  no  attempt  has  been  made  to  consider 
other  classification  techniques  nor  to  expand  on  the  three 
classifiers  discussed.  An  obvious  extension  of  the 
computerized  classifier  is  one  which  continually  updates  the 
data  base  as  targets  are  successfully  identified  in  the 
battlefield. It has been shown, however, that there exist
three classification techniques which can be easily
applied to experimental data from mm-wave radars.

To  implement  the  maximum  likelihood  classifier,  one 
makes  a  broad  assumption  about  the  training  data  and  boils 
the  data  down  to  very  few  numbers;  the  result  is  low  memory 
storage  and  very  few  computations  needed  for  classification. 
The validity of the Gaussian assumption is questionable but
is given some credence by the central limit theorem.

In  the  nearest  neighbor  classifier,  no  assumptions  are 
made,  and  all  of  the  training  data  is  retained.  Many 
repetitive  but  simple  computations  must  be  performed  to 
classify  a  target. 

As  a  result  of  extensive  preprocessing  of  the  data,  the 
linear  classifier  requires  low  memory  storage  and  has  a 
computation  load  intermediate  between  the  nearest  neighbor 
and  the  maximum  likelihood  classifier.  Although  the 
calculation  of  the  constants  representing  the  discriminant 
hyperplanes  is  complex,  the  use  of  these  constants  in  the 
field  classifier  is  straightforward. 

It  has  also  been  shown  that  currently  available 
microprocessors  have  sufficient  capabilities  to  implement 
the  classifiers  for  field  radar  systems.  It  is  felt  that 
the small size, high computing power, and low cost of
microprocessors will dictate their use in millimeter wave
target acquisition systems.

Finally, it is concluded that a program needs to be
carried out to integrate a microprocessor into an
experimental mm-wave radar system to allow an evaluation of
the various target signatures and classifiers. Experimental
target signature data are required to effectively assess the
utility of the classification techniques.